Comparative Evaluation of Four Different Sensitive Tabular Data Protection Methods Using a Real Life Table Structure of Complex Hierarchies and Links
Authors
Abstract
Practitioners of tabular data protection in national statistical agencies have some familiarity with commonly used table structures. However, they need guidance on how to evaluate the appropriateness of various sensitive tabular data protection methods when applied to their own table structures. With that in mind, we use a real-life "typical" table structure of moderate hierarchical and linked complexity and populate it with synthetic microdata to evaluate the relative performance of four tabular data protection methods. The methods selected for the evaluation are: 1) LP-based classical cell suppression; 2) LP-based CTA (Dandekar 2001); 3) network flow-based cell suppression as implemented in DiAna, a software product made available to other Federal statistical agencies by the US Census Bureau; and 4) a microdata-level noise addition method documented in a US Census Bureau research paper. The outcome from the comparative evaluation is available from
Related resources
Combining Extended Table Lens and Treemap Techniques for Visualizing Tabular Data
We present a framework for visualizing large tabular data that combines two views: the table view and the treemap view. The table view extends the known table lens as follows: we cluster related elements to reduce subsampling artifacts and achieve rendering time that is independent of table size; we use multiple-column sorting to create scenario-specific data hierarchies on the fly; and we use shaded cushi...
A Computational Evaluation of Optimization Solvers for CTA
Minimum-distance controlled tabular adjustment (CTA) methods, and their variants, are considered an emerging perturbative approach for tabular data protection. Given a table to be protected, the purpose of CTA is to find the closest table that guarantees protection levels for the sensitive cells. We consider the most general CTA formulation which includes binary variables, thus providing protected...
Statistical disclosure control in tabular data
Data disseminated by National Statistical Agencies (NSAs) can be classified as either microdata or tabular data. Tabular data is obtained from microdata by crossing one or more categorical variables. Although cell tables provide aggregated information, they also need to be protected. This chapter is a short introduction to tabular data protection. It contains three main sections. The first one ...
Minimum-distance controlled perturbation methods for large-scale tabular data protection
National Statistical Agencies routinely release large amounts of tabular information. Prior to dissemination, tabular data needs to be processed to avoid the disclosure of individual confidential information. One widely used class of methods is based on the modification of the table cell values. However, previous approaches were not able to preserve the values of the marginal cells and the add...
Mathematical Programming Models for Balancing Data Quality and Confidentiality in Tabular Data
1. Mathematical Programming Model for Controlled Tabular Adjustment (CTA)
Statistical agencies use different methods to protect the confidentiality of tabular data. The most widely used method, complementary cell suppression, suppresses both primary (sensitive) and secondary (non-sensitive) cells to assure confidentiality. Despite its popularity, it suffers from severe limitations. Complementar...